@miantalha45 miantalha45 commented Feb 7, 2026

Description

Add automatic reconnection to the IoTDB CLI when the connection to the server is lost during an interactive session (e.g. server restart, network blip, or idle timeout). The CLI no longer exits immediately on connection-related errors; it attempts to reconnect with the same parameters and retries the failed command, aligning behavior with the Session API, JDBC, and C++/Python clients.

Content1 — Detection and reconnection flow

  • Detection: Connection loss is detected when a command fails with a connection-related SQLException. We treat an exception as connection-related if its message, or the message of its cause, contains (after lowercasing) any of: connection, refused, timeout, closed, reset, network, broken pipe. This logic lives in AbstractCli.isConnectionRelated(SQLException) and matchesConnectionFailure(String) so it can be shared and reused.
  • Reconnection: On such a failure, the CLI closes the current connection and opens a new one using the same parameters (host, port, user, password, and options) via DriverManager.getConnection and the existing info properties. Helper methods openConnection(), setupConnection(), and closeConnectionQuietly() in Cli encapsulate open/setup/close so the main loop stays clear.
  • Retry: After a successful reconnection, the same user command (the current line that failed) is retried with the new connection. We retry reconnection up to 3 times with a 1 s delay between attempts (no delay before the first attempt). Constants RECONNECT_RETRY_NUM and RECONNECT_RETRY_INTERVAL_MS in Cli control this; they are not yet user-configurable.
  • Feedback: On successful reconnection we print: "Connection lost. Reconnected. Retrying command." If all reconnection attempts fail we print: "IoTDB: Could not reconnect after 3 attempts. Please check that the server is running and try again." and exit with an error code.
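
The detection rule in the first bullet can be illustrated with a minimal sketch. This is not the PR's code: the class name ConnectionFailureSketch is hypothetical, and checking only the direct cause (rather than the whole cause chain) is an assumption. The keyword list is the one given in the description.

```java
import java.sql.SQLException;
import java.util.Locale;

// Hypothetical illustration; the real logic lives in AbstractCli and may differ.
final class ConnectionFailureSketch {
  // Keywords taken from the PR description.
  private static final String[] KEYWORDS = {
    "connection", "refused", "timeout", "closed", "reset", "network", "broken pipe"
  };

  // True if the lowercased message contains any keyword.
  static boolean matchesConnectionFailure(String message) {
    if (message == null) {
      return false;
    }
    String lower = message.toLowerCase(Locale.ROOT);
    for (String keyword : KEYWORDS) {
      if (lower.contains(keyword)) {
        return true;
      }
    }
    return false;
  }

  // Checks the exception message first, then the direct cause (assumption).
  static boolean isConnectionRelated(SQLException e) {
    if (e == null) {
      return false;
    }
    if (matchesConnectionFailure(e.getMessage())) {
      return true;
    }
    Throwable cause = e.getCause();
    return cause != null && matchesConnectionFailure(cause.getMessage());
  }
}
```

One consequence of plain substring matching is that unrelated messages can match: under this rule, an error text that merely mentions a path such as root.network.d1 would also be classified as connection-related.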

Content2 — Class and method organization

  • AbstractCli: Added isConnectionRelated(SQLException) (package-private static) and matchesConnectionFailure(String) (private static) for shared detection. In executeQuery, setTimeZone, and showTimeZone, we catch SQLException (or Exception where the API does not throw SQLException) and rethrow when isConnectionRelated(e); otherwise we keep the existing "print error and return error code" behavior. handleInputCmd and processCommand now declare throws SQLException so connection failures propagate to the CLI loop instead of being swallowed.
  • Cli: Introduced ReadLineResult (inner class with stop, failedCommand) and factory methods continueLoop(), stopLoop(), reconnectAndRetry(String) so the read-eval loop can signal "continue", "exit", or "reconnect and retry this command". receiveCommands() no longer uses try-with-resources for the connection; it holds the connection in a variable, and when readerReadLine() returns a result with failedCommand != null, it runs the reconnect loop (close → retry open/setup → print message → retry command). readerReadLine() wraps processCommand() in a try-catch; on connection-related SQLException it returns reconnectAndRetry(s) with the current line; on other SQLException it prints and returns stopLoop().
  • AbstractCliTest: testHandleInputInputCmd() now declares throws SQLException and imports java.sql.SQLException so it compiles with the updated handleInputCmd signature.
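
The three-way loop signal described for Cli could look roughly like the sketch below. The field and factory-method names follow the description above; the class name ReadLineResultSketch, the private constructor, and the use of final fields are assumptions made for illustration.

```java
// Hypothetical illustration of the read-eval loop signal; not the PR's actual inner class.
final class ReadLineResultSketch {
  final boolean stop;          // true when the loop should exit
  final String failedCommand;  // non-null only when reconnect-and-retry is requested

  private ReadLineResultSketch(boolean stop, String failedCommand) {
    this.stop = stop;
    this.failedCommand = failedCommand;
  }

  // Keep reading the next line.
  static ReadLineResultSketch continueLoop() {
    return new ReadLineResultSketch(false, null);
  }

  // Leave the loop (exit/quit, EOF, or a non-connection error).
  static ReadLineResultSketch stopLoop() {
    return new ReadLineResultSketch(true, null);
  }

  // Ask the caller to reconnect and run the given line again.
  static ReadLineResultSketch reconnectAndRetry(String failedCommand) {
    return new ReadLineResultSketch(false, failedCommand);
  }
}
```

A caller would check failedCommand != null first, then stop, which matches the order described for receiveCommands().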

Content3 — Corner cases and alternatives

  • Corner cases: If reconnection succeeds but the retried command fails again with a connection-related error, the outer loop will see another reconnectAndRetry and run the same reconnect/retry flow again (each time with up to 3 reconnect attempts). Non-connection SQLExceptions still print the error and stop the loop (exit) as before. Interrupt and EOF handling in readerReadLine() are unchanged.
  • Session/statement errors after reconnect: If the server returns an error such as "StatementId doesn't exist in this session" (e.g. after a server stop/start), the retried command or the next user command can fail with that instead of a connection error. We treat such exceptions as session/statement state errors via isSessionOrStatementError() in AbstractCli and show: "Reconnected, but the previous command could not be completed. Please run your command again." so the user is not shown the raw exception. This handling is applied both in the reconnect-retry path in Cli and in AbstractCli.executeQuery for the normal command path.
  • Alternatives considered: (1) Reconnect without retrying the failed command—simpler but worse UX. (2) Prompt "Reconnect? (y/n)"—gives control but adds friction and is less script-friendly. (3) Leave current behavior—rejected to align CLI with other clients and improve long-lived session UX.

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR
  • org.apache.iotdb.cli.AbstractCli: isConnectionRelated, matchesConnectionFailure, isSessionOrStatementError, matchesSessionOrStatementFailure; rethrow connection-related SQLException in executeQuery, setTimeZone, showTimeZone; in executeQuery show friendly message for session/statement errors; handleInputCmd and processCommand now declare throws SQLException
  • org.apache.iotdb.cli.Cli: ReadLineResult, openConnection(), setupConnection(), closeConnectionQuietly(); refactored receiveCommands() and readerReadLine() for reconnect-and-retry flow; handle session/statement error in reconnect-retry path
  • org.apache.iotdb.cli.AbstractCliTest: testHandleInputInputCmd() updated for throws SQLException

Closes #17179

- Detect connection-related SQLExceptions (refused, timeout, closed, reset, etc.)
- In AbstractCli: rethrow connection-related SQLException from executeQuery,
  setTimeZone, showTimeZone so CLI can handle them
- In Cli: on connection loss, close current connection, retry reconnect up to 3
  times with 1s interval, then retry the failed command; print 'Connection lost.
  Reconnected. Retrying command.' on success; exit with clear message after
  all retries fail
- Add isConnectionRelated() and matchesConnectionFailure() in AbstractCli for
  shared detection; openConnection(), setupConnection(), closeConnectionQuietly()
  and ReadLineResult in Cli for reconnect flow
- Update AbstractCliTest to declare throws SQLException for handleInputCmd calls

Co-authored-by: Cursor <cursoragent@cursor.com>

@github-actions github-actions bot left a comment

Hi, this is your first pull request in the IoTDB project. Thanks for your contribution! IoTDB will be better because of you.

When reconnect succeeds but the retried command fails with a session/statement
error (e.g. StatementId doesn't exist in this session), show a friendly message
instead of the raw exception. Apply the same handling in AbstractCli.executeQuery
so the message is shown both during reconnect-retry and when the user runs the
next command. Add isSessionOrStatement

Copilot AI left a comment

Pull request overview

Adds automatic reconnection + retry behavior to the IoTDB CLI interactive session when connection-related failures occur, aligning CLI behavior with other IoTDB clients.

Changes:

  • Introduces shared connection-failure/session-state error detection in AbstractCli and propagates connection-related SQLExceptions to the main loop.
  • Refactors Cli.receiveCommands() / readerReadLine() to attempt reconnect (bounded retries with a fixed delay between attempts) and retry the failed input line.
  • Updates AbstractCliTest for the new throws SQLException method signatures.
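
The bounded reconnect in the second bullet can be sketched as follows, using the values stated in the PR description (3 attempts, 1 s between attempts, no delay before the first). The class name ReconnectLoopSketch, the generic reconnect helper, and the Callable-based opener are hypothetical; the actual loop in Cli.receiveCommands() is structured differently.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration of a bounded retry with a fixed interval.
final class ReconnectLoopSketch {
  static final int RECONNECT_RETRY_NUM = 3;
  static final long RECONNECT_RETRY_INTERVAL_MS = 1000L;

  // Returns the opened resource, or null if every attempt failed or the thread was interrupted.
  static <T> T reconnect(Callable<T> opener, int maxAttempts, long intervalMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (attempt > 1) {
          Thread.sleep(intervalMs); // no delay before the first attempt
        }
        return opener.call();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and give up
        return null;
      } catch (Exception e) {
        // Any other failure counts as one failed attempt; try again if attempts remain.
      }
    }
    return null;
  }
}
```

A caller would pass something like () -> openConnection() together with the two constants and treat a null result as "could not reconnect".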

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File: Description
  • iotdb-client/cli/src/main/java/org/apache/iotdb/cli/Cli.java: Implements reconnect/retry loop, connection open/setup/close helpers, and new read-loop result signaling.
  • iotdb-client/cli/src/main/java/org/apache/iotdb/cli/AbstractCli.java: Adds connection/session error classifiers and changes command execution to rethrow connection-related SQLExceptions.
  • iotdb-client/cli/src/test/java/org/apache/iotdb/cli/AbstractCliTest.java: Adjusts unit test compilation due to updated method signatures.

Comment on lines +277 to +283
try {
  connection = openConnection();
  setupConnection(connection);
  ctx.getPrinter().println("Connection lost. Reconnected. Retrying command.");
  processCommand(ctx, result.failedCommand, connection);
  reconnected = true;
  break;

Copilot AI Feb 10, 2026

The reconnect retry try { connection = openConnection(); setupConnection(connection); ... } can throw TException from setupConnection(), but the retry loop only catches SQLException. This means some connection-loss cases can bypass the reconnect logic and bubble out immediately. Suggestion: include TException (or a common supertype) in the reconnect-attempt catch and handle it as a reconnect failure (with the same retry/backoff), while still distinguishing non-connection SQL errors from command execution.

Copilot uses AI. Check for mistakes.
if (isSessionOrStatementError(e)) {
  ctx.getPrinter()
      .println(
          "Reconnected, but the previous command could not be completed. Please run your command again.");

Copilot AI Feb 10, 2026

executeQuery prints "Reconnected, but the previous command could not be completed..." for session/statement-state errors, but this method cannot know whether a reconnect actually happened. If the same server-side error occurs without any reconnect, this message is misleading. Suggestion: make the message not assume reconnection (e.g., "Session state was reset..."), or only print this in the reconnect-and-retry path where reconnection is known to have occurred.

Suggested change
- "Reconnected, but the previous command could not be completed. Please run your command again.");
+ "Session state was reset and the previous command could not be completed. Please run your command again.");

Comment on lines +325 to +328
} catch (SQLException e) {
  if (isConnectionRelated(e)) {
    return ReadLineResult.reconnectAndRetry(s);
  }

Copilot AI Feb 10, 2026

The reconnection trigger/loop is a significant behavior change but there are no unit tests exercising it (e.g., simulating a connection-related SQLException from processCommand, verifying retry count/backoff, and ensuring non-connection SQLExceptions do not trigger reconnect or an incorrect "Could not reconnect" exit). Consider adding focused tests around this new flow to prevent regressions.

Comment on lines +280 to +283
ctx.getPrinter().println("Connection lost. Reconnected. Retrying command.");
processCommand(ctx, result.failedCommand, connection);
reconnected = true;
break;

Copilot AI Feb 10, 2026

In the reconnect-and-retry path, the return value of processCommand(...) is ignored. If the retried input contains an exit/quit statement (or otherwise returns false), the CLI will continue the main loop instead of stopping as it normally would. Suggestion: capture the boolean return from processCommand and convert it into a result.stop / break out of the outer loop when it indicates the user requested exit.

Comment on lines +284 to +300
} catch (SQLException e) {
  if (isSessionOrStatementError(e)) {
    // Reconnect succeeded but retry failed due to session/statement state; ask user to
    // run the command again.
    ctx.getPrinter()
        .println(
            "Reconnected, but the previous command could not be completed. Please run your command again.");
    reconnected = true;
    break;
  }
  if (attempt == RECONNECT_RETRY_NUM) {
    ctx.getErr()
        .printf(
            "%s: Could not reconnect after %d attempts. Please check that the server is running and try again.%n",
            IOTDB_ERROR_PREFIX, RECONNECT_RETRY_NUM);
    ctx.exit(CODE_ERROR);
  }

Copilot AI Feb 10, 2026

In the reconnect loop, any SQLException thrown while retrying the failed command is treated as a reconnect failure. This can incorrectly trigger further reconnect attempts (and eventually exit with "Could not reconnect") for non-connection SQL errors (e.g., syntax/permission errors) that happen after the reconnect succeeds. Suggestion: only retry when isConnectionRelated(e) is true; otherwise report the SQL error from the retried command and continue the main loop (or stop) without attempting further reconnects.
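
One way to picture the suggested separation is to treat opening the connection and re-running the command as two phases with different failure meanings. The sketch below is an assumption-heavy illustration: TwoPhaseRetrySketch, SqlAction, and Outcome are hypothetical names, and the connection-related check is passed in as a predicate rather than calling the PR's isConnectionRelated directly.

```java
import java.sql.SQLException;
import java.util.function.Predicate;

// Hypothetical illustration: distinguish reconnect failures from command failures.
final class TwoPhaseRetrySketch {
  interface SqlAction {
    void run() throws SQLException;
  }

  enum Outcome { DONE, RECONNECT_FAILED, COMMAND_FAILED }

  static Outcome attempt(
      SqlAction open, SqlAction command, Predicate<SQLException> connectionRelated) {
    try {
      open.run(); // phase 1: any failure here is a failed reconnect attempt
    } catch (SQLException e) {
      return Outcome.RECONNECT_FAILED;
    }
    try {
      command.run(); // phase 2: the retried user command
      return Outcome.DONE;
    } catch (SQLException e) {
      // Only connection-related errors justify another reconnect attempt;
      // syntax or permission errors should be reported to the user instead.
      return connectionRelated.test(e) ? Outcome.RECONNECT_FAILED : Outcome.COMMAND_FAILED;
    }
  }
}
```

With this shape, the retry loop would continue only on RECONNECT_FAILED and would print the SQL error and return to the prompt on COMMAND_FAILED.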

Comment on lines +277 to +301
try {
  connection = openConnection();
  setupConnection(connection);
  ctx.getPrinter().println("Connection lost. Reconnected. Retrying command.");
  processCommand(ctx, result.failedCommand, connection);
  reconnected = true;
  break;
} catch (SQLException e) {
  if (isSessionOrStatementError(e)) {
    // Reconnect succeeded but retry failed due to session/statement state; ask user to
    // run the command again.
    ctx.getPrinter()
        .println(
            "Reconnected, but the previous command could not be completed. Please run your command again.");
    reconnected = true;
    break;
  }
  if (attempt == RECONNECT_RETRY_NUM) {
    ctx.getErr()
        .printf(
            "%s: Could not reconnect after %d attempts. Please check that the server is running and try again.%n",
            IOTDB_ERROR_PREFIX, RECONNECT_RETRY_NUM);
    ctx.exit(CODE_ERROR);
  }
}

Copilot AI Feb 10, 2026

On reconnect attempts, if openConnection() succeeds but setupConnection() or processCommand() throws, the newly created connection is not closed before the next attempt (the variable is overwritten on the next iteration). This can leak sockets/sessions across retries. Suggestion: in the catch path, call closeConnectionQuietly(connection) and set connection = null before continuing to the next attempt.
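
The close-on-failure pattern suggested here can be sketched independently of the CLI classes. CloseOnFailureSketch, Opener, and Setup are hypothetical names; the sketch works with any AutoCloseable and only illustrates that a half-initialized connection is closed before the error propagates to the retry loop.

```java
// Hypothetical illustration: never leave a freshly opened resource behind when setup fails.
final class CloseOnFailureSketch {
  interface Opener<T extends AutoCloseable> {
    T open() throws Exception;
  }

  interface Setup<T> {
    void apply(T resource) throws Exception;
  }

  // Closes the resource and ignores any error from close().
  static void closeQuietly(AutoCloseable resource) {
    if (resource == null) {
      return;
    }
    try {
      resource.close();
    } catch (Exception ignored) {
      // Nothing useful can be done if close itself fails.
    }
  }

  // Opens and configures a resource; on setup failure the resource is closed, then the error is rethrown.
  static <T extends AutoCloseable> T openAndSetup(Opener<T> opener, Setup<T> setup)
      throws Exception {
    T resource = opener.open();
    try {
      setup.apply(resource);
      return resource;
    } catch (Exception e) {
      closeQuietly(resource);
      throw e;
    }
  }
}
```

Because the helper either returns a fully set-up resource or closes it and throws, the caller's variable is never overwritten while still holding an open connection.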

private static void setupConnection(IoTDBConnection connection)
    throws java.sql.SQLException, org.apache.thrift.TException {
  connection.setQueryTimeout(queryTimeout);
  properties = connection.getServerProperties();

Copilot AI Feb 10, 2026

setupConnection() appends to AGGREGRATE_TIME_LIST every time a reconnect happens, but the list is static and never cleared/deduped. During repeated reconnects this will grow unbounded and may introduce duplicate entries. Suggestion: clear the list before adding, or change it to a Set (if ordering is not required).

Suggested change
- properties = connection.getServerProperties();
+ properties = connection.getServerProperties();
+ AGGREGRATE_TIME_LIST.clear();

Comment on lines 319 to +328
  s = ctx.getLineReader().readLine(cliPrefix + "> ", null);
- boolean continues = processCommand(ctx, s, connection);
- if (!continues) {
-   return true;
+ try {
+   boolean continues = processCommand(ctx, s, connection);
+   if (!continues) {
+     return ReadLineResult.stopLoop();
+   }
+ } catch (SQLException e) {
+   if (isConnectionRelated(e)) {
+     return ReadLineResult.reconnectAndRetry(s);
+   }

Copilot AI Feb 10, 2026

When a connection-related SQLException occurs, the CLI retries the entire raw input line s. If the line contains multiple statements separated by ;, some statements may already have executed successfully before the failure, and retrying the whole line can re-run those statements (duplicate writes / side effects). Suggestion: either only retry the specific statement that failed (track progress inside processCommand), or disable auto-retry for multi-statement input and ask the user to rerun manually.
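
The second option (disabling auto-retry for multi-statement input) needs some way to tell whether a line holds more than one statement. The sketch below is a deliberately naive illustration: MultiStatementSketch and its methods are hypothetical, it only skips semicolons inside single or double quotes, and it does not handle escaped quotes, backtick identifiers, or comments. How processCommand actually splits input may differ.

```java
// Hypothetical illustration: decide whether a raw input line is safe to auto-retry.
final class MultiStatementSketch {
  // Counts non-empty statements separated by ';' outside of quotes.
  static int countStatements(String line) {
    int count = 0;
    boolean inSingle = false;
    boolean inDouble = false;
    boolean hasContent = false;
    for (int i = 0; i < line.length(); i++) {
      char ch = line.charAt(i);
      if (ch == '\'' && !inDouble) {
        inSingle = !inSingle;
      } else if (ch == '"' && !inSingle) {
        inDouble = !inDouble;
      }
      if (ch == ';' && !inSingle && !inDouble) {
        if (hasContent) {
          count++; // a statement ended at this separator
        }
        hasContent = false;
      } else if (!Character.isWhitespace(ch)) {
        hasContent = true;
      }
    }
    return hasContent ? count + 1 : count; // trailing statement without ';'
  }

  // Auto-retry only when at most one statement would be re-executed.
  static boolean safeToAutoRetry(String line) {
    return countStatements(line) <= 1;
  }
}
```

When safeToAutoRetry returns false, the CLI could reconnect but ask the user to rerun the line manually, which avoids duplicating writes that already succeeded.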

closeConnectionQuietly(connection);
connection = null;
boolean reconnected = false;
for (int attempt = 1; attempt <= RECONNECT_RETRY_NUM; attempt++) {

Copilot AI Feb 10, 2026

Test is always true, because of this condition.

Development

Successfully merging this pull request may close these issues.

[Feature request] CLI should support automatic reconnection when connection is lost during interactive session

1 participant